Modeling phone correlation for speaker adaptive speech recognition

Authors

  • Baojie Li
  • Keikichi Hirose
  • Nobuaki Minematsu
Abstract

Information about phone relationships is regarded as playing an important role in speech recognition, and it has been successfully exploited in many speaker adaptation approaches. In this paper, we propose a new approach, named Phone Pair Model (PPM) re-scoring, that utilizes phone relationships for speaker-adaptive speech recognition. The PPM re-scoring approach does not actually adapt model parameters to a new speaker. Instead, it uses some pre-registered phone samples from the speaker being recognized to re-calculate the phone likelihoods that have been computed with conventional phone HMMs, resulting in a more accurate recognition result. Additionally, it can adequately deal with not only inter-speaker but also intra-speaker acoustic variations. Results of two recognition experiments, one using phone HMMs only and the other combining phone HMMs with the PPMs, showed that even when only a few vowel samples were used as the pre-registered phones, the PPM re-scoring approach increased the recognition rate.

Similar resources

Convolutional neural network with adaptive windows for speech recognition

Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their performance and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

A comparison of normalization and training approaches for ASR-dependent speaker identification

In this paper we discuss a speaker identification approach, called ASR-dependent speaker identification, that incorporates phonetic knowledge into the models for each speaker. This approach differs from traditional methods for performing text-independent speaker identification, such as global Gaussian mixture modeling, that typically ignore the phonetic content of the speech signal. We introduce...

Phone Adaptive Training for Speaker Diarization

The linguistic content of a speech signal is a source of unwanted variation which can degrade speaker diarization performance. This paper presents our latest work to reduce its impact. The new approach, referred to as Phone Adaptive Training (PAT), is analogous to speaker adaptive training used in automatic speech recognition. We report an oracle experiment which shows that PAT has the potentia...

Publication date: 2000